AITopics | return distribution

Collaborating Authors

return distribution

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Intrinsic Benefits of Categorical Distributional Loss: Uncertainty-aware Regularized Exploration in Reinforcement Learning

Neural Information Processing SystemsJun-23-2026, 00:41:52 GMT

The remarkable empirical performance of distributional reinforcement learning (RL) has garnered increasing attention to understanding its theoretical advantages over classical RL. By decomposing the categorical distributional loss commonly employed in distributional RL, we find that the potential superiority of distributional RL can be attributed to a derived distribution-matching entropy regularization. This less-studied entropy regularization aims to capture additional knowledge of return distribution beyond only its expectation, contributing to an augmented reward signal in policy optimization. In contrast to the vanilla entropy regularization in MaxEnt RL, which explicitly encourages exploration by promoting diverse actions, the novel entropy regularization derived from categorical distributional loss implicitly updates policies to align the learned policy with (estimated) environmental uncertainty. Finally, extensive experiments verify the significance of this uncertainty-aware regularization from distributional RL on the empirical benefits over classical RL. Our study offers an innovative exploration perspective to explain the intrinsic benefits of distributional learning in RL.

distributional rl, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America > Canada (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Rebalancing Return Coverage for Conditional Sequence Modeling in Offline Reinforcement Learning

Neural Information Processing SystemsJun-22-2026, 22:32:06 GMT

Recent advancements in offline reinforcement learning (RL) have underscored the capabilities of conditional sequence modeling (CSM), a paradigm that models the action distribution conditioned on both historical trajectories and target returns associated with each state. However, due to the imbalanced return distribution caused by suboptimal datasets, CSM is grappling with a serious distributional shift problem when conditioning on high returns. While recent approaches attempt to empirically tackle this challenge through return rebalancing techniques such as weighted sampling and value-regularized supervision, the relationship between return rebalancing and the performance of CSM methods is not well understood. In this paper, we reveal that both expert-level and full-spectrum return-coverage critically influence the performance and sample efficiency of CSM policies. Building on this finding, we devise a simple yet effective return-coverage rebalancing mechanism that can be seamlessly integrated into common CSM frameworks, including the most widely used one, Decision Transformer (DT). The resulting CSM algorithm, referred to as Return-rebalanced Value-regularized Decision Transformer (RVDT), integrates both implicit and explicit return-coverage rebalancing mechanisms, and achieves state-of-the-art performance in the D4RL experiments.

machine learning, reinforcement learning, trajectory, (16 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.67)
Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation

Neural Information Processing SystemsJun-22-2026, 19:44:01 GMT

Reliable uncertainty quantification is crucial for reinforcement learning (RL) in high-stakes settings. We propose a unified conformal prediction framework for infinite-horizon policy evaluation that constructs distribution-free prediction intervals for returns in both on-policy and off-policy settings.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (0.93)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Modeling & Simulation (0.93)

Add feedback

Convergence Theorems for Entropy-Regularized and Distributional Reinforcement Learning

Neural Information Processing SystemsJun-11-2026, 09:20:03 GMT

In the pursuit of finding an optimal policy, reinforcement learning (RL) methods generally ignore the properties of learned policies apart from their expected return. Thus, even when successful, it is difficult to characterize which policies will be learned and what they will do. In this work, we present a theoretical framework for policy optimization that guarantees convergence to a particular optimal policy, via vanishing entropy regularization and a . Our approach realizes an interpretable, diversity-preserving optimal policy as the regularization temperature vanishes and ensures the convergence of policy derived objects--value functions and return distributions. In a particular instance of our method, for example, the realized policy samples all optimal actions uniformly. Leveraging our temperature decoupling gambit, we present an algorithm that estimates, to arbitrary accuracy, the return distribution associated to its interpretable, diversity-preserving optimal policy.

artificial intelligence, machine learning, reinforcement learning, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.32)

Add feedback

Beyond Average Return in Markov Decision Processes

Neural Information Processing SystemsApr-29-2026, 10:56:40 GMT

What are the functionals of the reward that can be computed and optimized exactly in Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic Programming (DP) can only handle these operations efficiently for certain classes of statistics. We summarize the characterization of these classes for policy evaluation, and give a new answer for the planning problem. Interestingly, we prove that only generalized means can be optimized exactly, even in the more general framework of Distributional Reinforcement Learning (DistRL). DistRL permits, however, to evaluate other functionals approximately. We provide error bounds on the resulting estimators, and discuss the potential of this approach as well as its limitations. These results contribute to advancing the theory of Markov Decision Processes by examining overall characteristics of the return, and particularly risk-conscious strategies.

dynamic programming, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: Europe (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.90)

Add feedback

Distributional Off-Policy Evaluation with Deep Quantile Process Regression

Kuang, Qi, Wang, Chao, Jiao, Yuling, Zhou, Fan

arXiv.org Machine LearningApr-21-2026

This paper investigates the off-policy evaluation (OPE) problem from a distributional perspective. Rather than focusing solely on the expectation of the total return, as in most existing OPE methods, we aim to estimate the entire return distribution. To this end, we introduce a quantile-based approach for OPE using deep quantile process regression, presenting a novel algorithm called Deep Quantile Process regression-based Off-Policy Evaluation (DQPOPE). We provide new theoretical insights into the deep quantile process regression technique, extending existing approaches that estimate discrete quantiles to estimate a continuous quantile function. A key contribution of our work is the rigorous sample complexity analysis for distributional OPE with deep neural networks, bridging theoretical analysis with practical algorithmic implementations. We show that DQPOPE achieves statistical advantages by estimating the full return distribution using the same sample size required to estimate a single policy value using conventional methods. Empirical studies further show that DQPOPE provides significantly more precise and robust policy value estimates than standard methods, thereby enhancing the practical applicability and effectiveness of distributional reinforcement learning approaches.

assumption 4, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2604.18143

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Spain > Galicia > Madrid (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.92)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
Health & Medicine > Therapeutic Area > Immunology (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback

Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning

Neural Information Processing SystemsMar-20-2026, 16:05:20 GMT

When decisions are made at high frequency, traditional reinforcement learning (RL) methods struggle to accurately estimate action values. In turn, their performance is inconsistent and often poor. Whether the performance of distributional RL (DRL) agents suffers similarly, however, is unknown. In this work, we establish that DRL agents sensitive to the decision frequency. We prove that action-conditioned return distributions collapse to their underlying policy's return distribution as the decision frequency increases.

artificial intelligence, machine learning, reinforcement learning, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

Add feedback

RMIX: LearningRisk-SensitivePoliciesfor CooperativeReinforcementLearningAgents

Neural Information Processing SystemsFeb-19-2026, 08:36:25 GMT

Current value-based multi-agent reinforcement learning methods optimize individual Q values to guide individuals' behaviours via centralized training with decentralized execution (CTDE). However, such expected, i.e., risk-neutral, Q value is not sufficient even with CTDE due to the randomness of rewards and the uncertainty in environments, which causes the failure of these methods to train coordinating agents incomplexenvironments. Toaddress these issues, we propose RMIX, anovelcooperativeMARL method with theConditional Value at Risk (CVaR) measure over the learned distributions of individuals' Q values. Specifically, we first learn the return distributions of individuals to analytically calculate CVaRfordecentralized execution. Then,tohandle thetemporal nature of the stochastic outcomes during executions, we propose a dynamic risk level predictorforriskleveltuning.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Oregon (0.04)
Asia > Singapore (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Filters

Collaborating Authors

return distribution

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Intrinsic Benefits of Categorical Distributional Loss: Uncertainty-aware Regularized Exploration in Reinforcement Learning

Rebalancing Return Coverage for Conditional Sequence Modeling in Offline Reinforcement Learning

Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation

Convergence Theorems for Entropy-Regularized and Distributional Reinforcement Learning

Beyond Average Return in Markov Decision Processes

1663fba7b56da1e96bed6e30546a07b0-Supplemental-Conference.pdf

0b9e57c46de934cee33b0e8d1839bfc2-Paper.pdf

Distributional Off-Policy Evaluation with Deep Quantile Process Regression

Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning

RMIX: LearningRisk-SensitivePoliciesfor CooperativeReinforcementLearningAgents